This notebook will cover some initial findings from the exploratory data analysis of the Yelp brewery data.

library(tidyverse)  # provides %>%, gather(), fct_recode(), and ggplot() used below
yelp <- read.csv('../data/cleanYelp.csv')
yelp

Distribution of dependent variable

These density plots compare the distribution of ratings taken directly from Yelp with the ones I calculated myself (via a weighted average).

yelp %>% 
  gather(rating_type, rating, ratings, my_ratings) %>% 
  mutate(rating_type = fct_recode(rating_type, 'Yelp Ratings' = 'ratings', 'My Converted Ratings' = 'my_ratings')) %>% 
  ggplot(aes(x = rating)) + geom_density(fill = 'blue', alpha = .8) + xlab('Ratings') + facet_wrap(~rating_type) + 
  theme_bw()

We can see that Yelp rounds its ratings to the nearest 0.5, whereas calculating ratings by hand preserves the variability.
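
As a rough illustration of why the hand-calculated ratings vary more, the sketch below computes a weighted average from a vector of star counts and then rounds it to the nearest half star. The counts are made up, and the exact way the my_ratings column was built is not shown in this notebook, so treat this as an assumption about the general approach rather than the actual calculation.

```r
# Hypothetical review counts for one brewery, for 1 through 5 stars
star_counts <- c(2, 1, 4, 10, 23)

# Weighted average: each star value weighted by how many reviews gave it
weighted_rating <- weighted.mean(1:5, w = star_counts)

# Rounding to the nearest half star, as the displayed Yelp rating appears to do
rounded_rating <- round(weighted_rating * 2) / 2

print(c(weighted = weighted_rating, rounded = rounded_rating))
```

Two breweries with weighted averages of, say, 4.3 and 4.7 would both show up as 4.5 after rounding, which is the loss of variability visible in the plots.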

Findings

Price

Do people prefer cheaper breweries to more moderately-priced ones?

Sure, all things equal, it’s better to spend less money than more. But people often use price to infer quality: if something is expensive, it must be good. Do higher prices lead to higher ratings?

yelp %>% 
  filter(price_range != 'expensive') %>% 
  group_by(price_range) %>% 
  summarize(ratings = mean(my_ratings), se = sd(my_ratings) / sqrt(n())) %>% 
  ggplot(aes(x = price_range, y = ratings)) + geom_bar(stat='identity', width = .6) + geom_errorbar(aes(ymin = ratings - se, ymax = ratings + se), width = .5) + ylim(0,5) + theme_bw() + 
  ylab('Average Rating') + xlab('Price Range')

It looks like cheaper prices go along with higher ratings. Maybe people aren’t fooled by the illusion of price indicating quality; maybe they just know good beer and prefer it to be cheap. It’s also possible that, all things equal, people just like things that are cheap!
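
The error bars in the plot above are standard errors of the mean, computed as sd(my_ratings) / sqrt(n()). Here is a minimal base-R sketch of the same calculation on toy data. The ratings and the 'cheap' and 'moderate' labels are hypothetical stand-ins, not values taken from the Yelp data.

```r
# Toy ratings for two hypothetical price groups (not the Yelp data)
ratings <- data.frame(
  price_range = c('cheap', 'cheap', 'cheap', 'cheap',
                  'moderate', 'moderate', 'moderate', 'moderate'),
  my_ratings  = c(4.5, 4.0, 5.0, 4.5, 4.0, 3.5, 4.0, 4.5)
)

# Standard error of the mean: sample SD divided by the square root of n
std_error <- function(x) sd(x) / sqrt(length(x))

# Split the ratings by price group, then summarize each group
by_price   <- split(ratings$my_ratings, ratings$price_range)
group_mean <- sapply(by_price, mean)
group_se   <- sapply(by_price, std_error)

# Lower and upper ends of each error bar (mean minus/plus one SE)
summary_table <- data.frame(mean  = group_mean,
                            lower = group_mean - group_se,
                            upper = group_mean + group_se)
print(summary_table)
```

Because the standard error shrinks as the group gets larger, a group with only a handful of breweries will have wide bars even if its ratings are fairly consistent.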

Can anything else explain how price range affects average rating? Maybe people only have a strong preference about the price range of a brewery when it doesn’t accept credit cards. It can be an extra hassle when scraping together the spare change in your car just isn’t enough to buy a refreshing drink.

options(warn = -1)
yelp %>% 
  filter(price_range != 'expensive') %>% 
  group_by(price_range, accepts_credit_cards) %>% 
  summarize(ratings = mean(my_ratings), se = sd(my_ratings, na.rm = TRUE) / sqrt(n())) %>% 
  ggplot(aes(x = price_range, y = ratings, group = factor(accepts_credit_cards))) + 
  geom_bar(stat = 'identity', aes(fill = factor(accepts_credit_cards)), position = position_dodge(width = .9)) +
  geom_errorbar(aes(ymin = ratings - se, ymax = ratings + se), position = position_dodge(width = .9), width = .5) + 
  ylim(0,5) + theme_bw() + 
  xlab('Price Range') + ylab('Average Rating') + 
  theme(legend.position = 'top', axis.title.x = element_text(margin = margin(t = 10))) +
  guides(fill = guide_legend(title='Does the brewery accept credit cards?')) + 
  scale_fill_manual(values = c('black', 'light grey'))

Sure enough, it looks like people are only opposed to spending a little bit more when plastic is a no-go. Get rid of those Cash Only signs!

Take Out

Take out: Good or bad?

More take out seems better than less take out, right? Perhaps not, say the data:

## Bin states into Low/Medium/High by rank of breweries per capita (the 50/3 and 100/3 cutoffs assume 50 states)
yelp <- yelp %>% 
  group_by(state) %>% 
  summarize(breweriesToPpl = max(breweriesToPpl)) %>% 
  arrange(breweriesToPpl) %>% 
  mutate(rank = 1:(nrow(.))) %>% 
  mutate(breweriesToPplFactor = factor(ifelse(rank < 50/3, 'Low', ifelse(rank > 100/3, 'High', 'Medium')))) %>% 
  select(state, breweriesToPplFactor) %>% 
  inner_join(yelp, by = 'state')
yelp %>% 
  group_by(take_out) %>% 
  summarize(ratings = mean(my_ratings), se = sd(my_ratings) / sqrt(n())) %>% 
  ggplot(aes(x = take_out, y = ratings)) + geom_bar(stat='identity', width = .6) + geom_errorbar(aes(ymin = ratings - se, ymax = ratings + se), width = .5) + ylim(0,5) + theme_bw() + 
  xlab('Does the brewery have take out?') + ylab('Average Rating')

Breweries without take out are rated more highly than those with it. This seems puzzling. The first thing to note is that places that do have take out are still doing quite well (i.e., 4/5 stars). Why are they being edged out by breweries with no take out? The few brewery crazies I know don’t love a certain brewery because of its take out; they love it because of the atmosphere and how great it is to sit back and drink a beer. If your main product is beer, then what is it that you’re actually selling to go? Food? Thinking about it this way, maybe breweries without take out have closer-knit cultures, leading people to rate those breweries more favorably.

Let’s say an owner has to have take out at their brewery. What are ways to make take out better for a brewery? Well, what if you wanted to do more to attract passersby on the street? If someone is in a hurry but sees a brewery offering take out, what feature would make them more likely to stop in and have an enjoyable experience? If it’s a densely populated community, it’s likely that people rely on bicycles for transportation. If breweries made it easy for cyclists to pop in and out, that might enhance the sense of community around the brewery and boost its ratings.

yelp %>% 
  group_by(take_out, bike_parking) %>% 
  summarize(ratings = mean(my_ratings), se = sd(my_ratings) / sqrt(n())) %>% 
  ggplot(aes(x = take_out, y = ratings, fill = bike_parking)) + geom_bar(stat='identity', aes(fill = bike_parking), position = position_dodge(width = .9)) + 
  geom_errorbar(aes(ymin = ratings - se, ymax = ratings + se), width = .5, position = position_dodge(width = .9)) + ylim(0,5) + theme_bw() + 
  xlab('Does the brewery have take out?') + ylab('Average Rating') + 
  scale_fill_manual(values = c('black', 'light grey'), name = 'Bike Parking Available') +
  theme(legend.position = 'top')

Offering bike parking seems to make take out more desirable. The effect looks slight, but if an owner had to have take out at their restaurant, I might recommend that they have a bike rack or two outside the brewery to welcome passersby (assuming the brewery is in a city where it makes sense to do so).

Do breweries do better in some areas than others?

Since we’ve been talking about how culture influences brewery success, maybe some areas have more brewery culture than others. Vermont and Colorado come to mind when thinking about strong brewery culture, while southern states come to mind when thinking about where breweries might be less popular. We can get a sense of brewery popularity by state by comparing the number of breweries with the population of each state:

yelp %>% 
  group_by(state) %>% 
  summarize(breweriesToPpl = max(breweriesToPpl)) %>% 
  ggplot(aes(x = reorder(state, breweriesToPpl, sum), y = breweriesToPpl, group = 1)) + geom_line() + coord_flip() + xlab('State (increasing in ratio of breweries to people)') +
  ylab('Breweries Per Capita') + theme_bw() + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) 

Maybe it doesn’t come as a surprise that places like Vermont and Colorado are more popular for breweries than a place like Georgia. How can breweries in less popular areas be more successful? It’s possible that offering bike racks outside breweries in an area where breweries aren’t quite as popular might help foster that sense of culture.
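
The Low, Medium, and High breweries-per-capita groups used in the next plot come from ranking states and splitting the ranks into thirds. Here is a minimal base-R sketch of that idea on made-up numbers. The six states and their values are hypothetical, and the sketch uses cut() with cutoffs scaled to the number of rows rather than the fixed 50/3 and 100/3 cutoffs in the notebook code.

```r
# Hypothetical breweries-per-capita values for six made-up states
states <- data.frame(
  state = c('A', 'B', 'C', 'D', 'E', 'F'),
  breweriesToPpl = c(0.8, 0.2, 0.5, 1.4, 0.3, 1.1)
)

# Rank states from fewest to most breweries per capita
states$rank <- rank(states$breweriesToPpl, ties.method = 'first')

# Cut the ranks into three roughly equal groups, ordered Low < Medium < High
n_states <- nrow(states)
states$group <- cut(states$rank,
                    breaks = c(0, n_states / 3, 2 * n_states / 3, n_states),
                    labels = c('Low', 'Medium', 'High'))

print(states[order(states$rank), ])
```

Scaling the cutoffs to the number of rows keeps the thirds balanced if the data ever includes more or fewer than 50 states.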

yelp <- yelp %>% 
  mutate(breweriesToPplFactor = factor(breweriesToPplFactor, levels = c('Low', 'Medium', 'High')),  # order levels Low < Medium < High
         take_out = recode(take_out, 'Yes' = 'Take Out Yes', 'No' = 'Take Out No')) 
yelp %>% 
  group_by(breweriesToPplFactor, bike_parking, take_out) %>% 
  summarize(rating = mean(my_ratings), se = sd(my_ratings) / sqrt(n())) %>% 
  ggplot(aes(x = breweriesToPplFactor, y = rating, group = bike_parking)) + 
  geom_bar(stat = 'identity', aes(fill = bike_parking), position = position_dodge(width = 0.9)) + 
  geom_errorbar(aes(ymin = rating - se, ymax = rating + se), position = position_dodge(width = .9), width = .5) + 
  facet_wrap(~take_out) + 
  ylim(0,5) + 
  scale_fill_manual(values = c('black','light grey'), name = 'Bike Parking Available') + 
  theme_bw() + 
  xlab('Breweries Per Capita') + 
  ylab('Average Rating') + 
  theme(legend.position = 'top',
        strip.background = element_rect(fill = 'white', color = 'black'))

Offering bicycle parking is one way for breweries in less popular areas to gain an advantage. It’s interesting to note that, across the board, breweries without take out are still rated higher than breweries with take out. And, if you’re going to offer take out in an area where breweries are popular, it doesn’t make much of a difference whether you have bike parking or not. But in areas where breweries are not that popular, offering bike parking might be one way for an owner to increase the success of their brewery.

 

A work by Dave Braun

dab414@lehigh.edu